Documents to Data (Text Processing)

Synopsis

Generates a data set from documents.

Description

This operator generates a data set from a collection of documents. For each document in the collection, one example is added to the data set. The text of the document is stored in a nominal attribute. If the documents carry a label or meta data, a label attribute or meta data attributes are created, respectively.

Input

  • documents (Collection)

    The documents port.

Output

  • example set (Data table)

    The example set port.

Parameters

  • text attribute

    The name of the text attribute.

  • label attribute

    The name of the label attribute.

  • add meta information

    If checked, available meta information about the text, such as file name and date, is added as attributes.

  • datamanagement

    Determines how the data is represented internally.
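
Example

The conversion described above can be illustrated with a small Python sketch. This is not the operator's implementation, only a hedged approximation of its documented behavior. The function name documents_to_data, the dictionary layout of a document (keys "text", "label", "meta"), and the choice to fill absent values with None are all assumptions made for illustration. The datamanagement parameter is not modeled.

```python
from typing import Any, Dict, List


def documents_to_data(
    documents: List[Dict[str, Any]],
    text_attribute: str = "text",
    label_attribute: str = "label",
    add_meta_information: bool = True,
) -> List[Dict[str, Any]]:
    """Build one row (example) per document. Hypothetical helper, not a RapidMiner API."""
    # A label column is created only if at least one document carries a label.
    has_label = any("label" in doc for doc in documents)

    # Collect meta data keys across the collection so every row has the same columns.
    meta_keys: List[str] = []
    if add_meta_information:
        for doc in documents:
            for key in doc.get("meta", {}):
                if key not in meta_keys:
                    meta_keys.append(key)

    rows = []
    for doc in documents:
        row = {text_attribute: doc["text"]}
        if has_label:
            # Assumption: a document without a label gets a missing value (None).
            row[label_attribute] = doc.get("label")
        for key in meta_keys:
            row[key] = doc.get("meta", {}).get(key)
        rows.append(row)
    return rows


docs = [
    {"text": "Cheap pills online", "label": "spam", "meta": {"filename": "a.txt"}},
    {"text": "Meeting at noon", "label": "ham",
     "meta": {"filename": "b.txt", "date": "2020-01-01"}},
]
rows = documents_to_data(docs, text_attribute="content", label_attribute="class")
```

Here each document yields exactly one row, the text and label columns take the names passed in, and the meta data columns appear only while add_meta_information is enabled.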